Founded by Benjamin Franklin in 1740
The Institute For Research In Cognitive Science

Abstract

This document describes a sizable grammar of English written in the TAG formalism and implemented for use with the XTAG system. This report and the grammar described herein supersede the TAG grammar described in [Abeillé et al., 1990]. The English grammar described in this report is based on the TAG formalism developed in [Joshi et al., 1975], which has been extended to include lexicalization ([Schabes et al., 1988]) and unification-based feature structures ([Vijay-Shanker and Joshi, 1991]). The grammar discussed in this report extends the grammar presented in [Abeillé et al., 1990] in at least two ways. First, this grammar has more detailed linguistic analyses, and second, the grammar presented in this report is fully implemented. The range of syntactic phenomena that can be handled is large and includes auxiliaries (including inversion), copula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, it-clefts, wh-clefts, PRO constructions, noun-noun modifications, extraposition, determiner phrases, genitives, negation, noun-verb contractions, sentential adjuncts and imperatives. The XTAG grammar has been relatively stable since November 1993, although new analyses are still being added periodically.

Acknowledgements

We are immensely grateful to Aravind Joshi for supporting this project. The principal authors of the current document are: Tilman Becker, Christine Doran, Dania Egedi, Beth Ann Hockey, Seth Kulick and B. Srinivas. We are also grateful to Rajesh Bhatt for editorial help. The following people have contributed to the development of grammars in the project: Anne Abeillé, Rajesh Bhatt, Kathleen Bishop, Sharon Cote, Beatrice Daille, Christine Doran, Dania Egedi, Jason Frank, Caroline Heycock, Beth Ann Hockey, Daniel Karp, Seth Kulick, Young-Suk Lee, Patrick Martin, Megan Moser, Sabine Petillon, Yves Schabes, Victoria Tredinnick and Raffaella Zanuttini.
The XTAG system has been developed by: Tilman Becker, Richard Billington, Andrew Chalnick, Dania Egedi, Devtosh Khare, Albert Lee, David Magerman, Alex Mallet, Patrick Paroubek, Rich Pito, Gilles Prigent, Anoop Sarkar, Yves Schabes, B. Srinivas, Yuji Yoshiie and Martin Zaidel. We would also like to thank Michael Hegarty, Lauri Karttunen, Anthony Kroch, Mitchell Marcus, Martha Palmer, Owen Rambow, Philip Resnik, Beatrice Santorini and Mark Steedman. In addition, Jeff Aaronson, Douglas DeCarlo, Mark-Jason Dominus, Mark Foster, Gaylord Holder, David Magerman, Ken Noble, Steven Shapiro and Ira Winston have provided technical support. Administrative support was provided by Carolyn Elken, Jodi Kerper, Christine Sandy and Trisha Yannuzzi. This work was partially supported by ARO grant DAAL03-89-0031, ARPA grant N00014-90-J-1863, NSF STC grant DIR-8920230, and Ben Franklin Partnership Program (PA) grant 93S.3078C-6.

Part I: General Information

Chapter 1: Getting Around

This technical report presents the English XTAG grammar as implemented by the XTAG Research Group at the University of Pennsylvania. The technical report is organized into four parts, plus a set of appendices. Part 1 contains general information about the XTAG system and some of the underlying mechanisms that help shape the grammar. Chapter 2 contains an introduction to the formalism behind the grammar and parser, while Chapter 3 contains information about the entire XTAG system. Linguists interested solely in the grammar of the XTAG system may safely skip Chapters 2 and 3. Chapter 4 contains information on some of the linguistic principles that underlie the XTAG grammar, including the distinction between complements and adjuncts, and how case is handled. The actual description of the grammar begins with Part 2, and is contained in the following three parts.
Parts 2 and 3 contain information on the verb classes and the types of trees allowed within the verb classes, respectively, while Part 4 contains information on trees not included in the verb classes (e.g. NP's, PP's, various modifiers, etc.). Chapter 5 of Part 2 contains a table that attempts to provide an overview of the verb classes and tree types by providing a graphical indication of which tree types are allowed in which verb classes. This has been cross-indexed to tree figures shown in the tech report. Chapter 6 contains an overview of all of the verb classes in the XTAG grammar. The rest of Part 2 contains more details on several of the more interesting verb classes, including ergatives, sentential subjects, sentential complements, small clauses, ditransitives, and it-clefts. Part 3 contains information on some of the tree types that are available within the verb classes. These tree types correspond to what would be transformations in a movement-based approach. Not all of these types of trees are contained in all of the verb classes. The table (previously mentioned) in Part 2 contains a list of the tree types and indicates which verb classes each occurs in. Part 4 focuses on the non-verb-class trees in the grammar. NP's and determiners are presented in Chapter 18, while the various modifier trees are presented in Chapter 19. Auxiliary verbs, which are classed separately from the verb classes, are presented in Chapter 20, while certain types of conjunction are shown in Chapter 21. Other conjunctions, such as subordinating and discourse conjunction, are considered tree types, and as such are included in the chapter on adjunct clauses (section 15.2). Sentential complements of NP's and PP's are discussed in section 8.8. Throughout the technical report, mention is occasionally made of changes or analyses that we hope to incorporate in the future. Appendix A details a list of these and other future work.
The appendices also contain information on some of the nitty-gritty details of the XTAG grammar, including the tree naming conventions (Appendix B), and a comprehensive list of the features used in the grammar (Appendix C). Appendix D contains an evaluation of the XTAG grammar, including comparisons with other wide-coverage grammars.

Chapter 2: Feature-Based, Lexicalized Tree Adjoining Grammars

The English grammar described in this report is based on the TAG formalism ([Joshi et al., 1975]), which has been extended to include lexicalization ([Schabes et al., 1988]) and unification-based feature structures ([Vijay-Shanker and Joshi, 1991]). Tree Adjoining Languages (TALs) fall into the class of mildly context-sensitive languages, and as such are more powerful than context-free languages. The TAG formalism in general, and lexicalized TAGs in particular, are well-suited for linguistic applications. As first shown by [Joshi, 1985] and [Kroch and Joshi, 1987], the properties of TAGs permit us to encapsulate diverse syntactic phenomena in a very natural way. For example, TAG's extended domain of locality and its factoring of recursion from local dependencies lead, among other things, to a localization of so-called unbounded dependencies.

2.1 TAG formalism

The primitive elements of the standard TAG formalism are known as elementary trees. Elementary trees are of two types: initial trees and auxiliary trees (see Figure 2.1). In describing natural language, initial trees are minimal linguistic structures that contain no recursion, i.e. trees containing the phrasal structure of simple sentences, NP's, PP's, and so forth. Initial trees are characterized by the following: 1) all internal nodes are labeled by non-terminals, 2) all leaf nodes are labeled by terminals, or by non-terminal nodes marked for substitution. An initial tree is called an X-type initial tree if its root is labeled with type X.
Recursive structures are represented by auxiliary trees, which represent constituents that are adjuncts to basic structures (e.g. adverbials). Auxiliary trees are characterized as follows: 1) all internal nodes are labeled by non-terminals, 2) all leaf nodes are labeled by terminals, or by non-terminal nodes marked for substitution, except for exactly one non-terminal node, called the foot node, which can only be used to adjoin the tree to another node (footnote 1), 3) the foot node has the same label as the root node of the tree.

Footnote 1: A null adjunction constraint (NA) is systematically put on the foot node of an auxiliary tree. This disallows adjunction of a tree onto the foot node itself.

[Figure 2.1: Elementary trees in TAG]

There are two operations defined in the TAG formalism, substitution (footnote 2) and adjunction. In the substitution operation, the root node of an initial tree is merged into a non-terminal leaf node marked for substitution in another initial tree, producing a new tree. The root node and the substitution node must have the same name. Figure 2.2 shows two initial trees and the tree resulting from the substitution of one tree into the other.

[Figure 2.2: Substitution in TAG]

In an adjunction operation, an auxiliary tree is grafted onto a non-terminal node anywhere in an initial tree. The root and foot nodes of the auxiliary tree must match the node at which the auxiliary tree adjoins. Figure 2.3 shows an auxiliary tree and an initial tree, and the tree resulting from an adjunction operation. A TAG G is a collection of finite initial trees, I, and auxiliary trees, A. The tree set of a TAG G, T(G), is defined to be the set of all derived trees starting from S-type initial trees in I whose frontier consists of terminal nodes (all substitution nodes having been filled).
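As a concrete, deliberately simplified illustration of the two operations, the sketch below models elementary trees as plain Python objects and derives John walked to Philadelphia from the four lexicalized trees of Figure 2.4. This is illustrative code only, not part of the XTAG system; the node representation and traversal strategy are our own assumptions.

```python
class Node:
    """A tree node; `subst` marks a substitution leaf (X↓), `foot` the
    foot node of an auxiliary tree (X*). Terminals are childless nodes."""
    def __init__(self, label, children=(), subst=False, foot=False):
        self.label, self.children = label, list(children)
        self.subst, self.foot = subst, foot

def substitute(tree, init):
    # Merge init's root into each substitution node bearing the same label.
    if tree.subst and tree.label == init.label:
        return init
    return Node(tree.label, [substitute(c, init) for c in tree.children],
                tree.subst, tree.foot)

def _plug_foot(aux, subtree):
    # Re-attach the displaced subtree at the auxiliary tree's foot node.
    if aux.foot:
        return subtree
    return Node(aux.label, [_plug_foot(c, subtree) for c in aux.children],
                aux.subst, aux.foot)

def adjoin(tree, aux):
    # Graft aux at each outermost node whose label matches aux's root
    # (in the demo below there is exactly one such node).
    if tree.label == aux.label and not tree.subst:
        return _plug_foot(aux, tree)
    return Node(tree.label, [adjoin(c, aux) for c in tree.children],
                tree.subst, tree.foot)

def frontier(tree):
    # Terminal strings on the frontier of a derived tree.
    if not tree.children:
        return [] if (tree.subst or tree.foot) else [tree.label]
    return [w for c in tree.children for w in frontier(c)]

# The four lexicalized trees of Figure 2.4:
john   = Node("NP", [Node("N", [Node("John")])])
walked = Node("S",  [Node("NP", subst=True),
                     Node("VP", [Node("V", [Node("walked")])])])
to_aux = Node("VP", [Node("VP", foot=True),
                     Node("PP", [Node("P", [Node("to")]),
                                 Node("NP", subst=True)])])
philly = Node("NP", [Node("N", [Node("Philadelphia")])])

derived = substitute(adjoin(substitute(walked, john), to_aux), philly)
```

Substituting john fills the subject NP↓, adjoining to_aux grafts the PP onto the VP node (the original VP re-attaching at the foot), and the final substitution fills the prepositional object.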
The string language generated by a TAG, L(G), is defined to be the set of all terminal strings on the frontier of the trees in T(G).

Footnote 2: Technically, substitution is a specialized version of adjunction, but it is useful to make a distinction between the two.

[Figure 2.3: Adjunction in TAG]

2.2 Lexicalization

`Lexicalized' grammars systematically associate each elementary structure with a lexical anchor. This means that in each structure there is a lexical item that is realized. It does not mean simply adding feature structures (such as head) and unification equations to the rules of the formalism. These resultant elementary structures specify extended domains of locality (as compared to CFGs) over which constraints can be stated. Following [Schabes et al., 1988] we say that a grammar is lexicalized if it consists of 1) a finite set of structures each associated with a lexical item, and 2) an operation or operations for composing the structures. Each lexical item will be called the anchor of the corresponding structure, which defines the domain of locality over which constraints are specified. Note then, that constraints are local with respect to their anchor. Not every grammar is in a lexicalized form (footnote 3). In the process of lexicalizing a grammar, the lexicalized grammar is required to be strongly equivalent to the original grammar, i.e. it must produce not only the same language, but the same structures or tree set as well.

[Figure 2.4: Lexicalized elementary trees: (a) John, (b) walked, (c) to, (d) Philadelphia]

Footnote 3: Notice the similarity of the definition of a lexicalized grammar with the off-line parsability constraint ([Kaplan and Bresnan, 1983]). As consequences of our definition, each structure has at least one lexical item (its anchor) attached to it and all sentences are finitely ambiguous.
In Figure 2.4, which shows sample initial and auxiliary trees, substitution sites are marked by a ↓, and foot nodes are marked by a *. This notation is standard and is followed in the rest of this report.

2.3 Unification-based features

In a unification framework, a feature structure is associated with each node in an elementary tree. This feature structure contains information about how the node interacts with other nodes in the tree. It consists of a top part, which generally contains information relating to the supernode, and a bottom part, which generally contains information relating to the subnode. Substitution nodes, however, have only the top features, since the tree substituting in logically carries the bottom features.

[Figure 2.5: Substitution in FB-LTAG]

The notions of substitution and adjunction must be augmented to fit within this new framework. The feature structure of a new node created by substitution inherits the union of the features of the original nodes. The top feature of the new node is the union of the top features of the two original nodes, while the bottom feature of the new node is simply the bottom feature of the top node of the substituting tree (since the substitution node has no bottom feature). Figure 2.5 (footnote 4) shows this more clearly. Adjunction is only slightly more complicated. The node being adjoined into splits, and its top feature unifies with the top feature of the root adjoining node, while its bottom feature unifies with the bottom feature of the foot adjoining node. Again, this is easier shown graphically, as in Figure 2.6 (footnote 5). The embedding of the TAG formalism in a unification framework allows us to dynamically specify local constraints that would have otherwise had to have been made statically within the trees. Constraints that verbs make on their complements, for instance, can be implemented through the feature structures.
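The top/bottom bookkeeping just described can be sketched in a few lines, treating feature structures as flat dictionaries. This is a hedged illustration of the operations in Figures 2.5 and 2.6, not the XTAG unification code, which works over recursive feature structures with reentrancy; all function names here are our own.

```python
def unify(f, g):
    """Unify two flat feature structures; fail on a clash of atomic values."""
    out = dict(f)
    for key, val in g.items():
        if key in out and out[key] != val:
            raise ValueError(f"feature clash on {key!r}: {out[key]} vs {val}")
        out[key] = val
    return out

def substitute_features(site_top, root_top, root_bottom):
    """Figure 2.5: the new node's top is the unification of the two tops;
    its bottom is the substituting root's bottom (a substitution node
    has no bottom features of its own)."""
    return unify(site_top, root_top), dict(root_bottom)

def adjoin_features(site_top, site_bottom,
                    aux_root_top, aux_root_bottom,
                    aux_foot_top, aux_foot_bottom):
    """Figure 2.6: the adjunction site splits into an upper and a lower node;
    its top unifies with the auxiliary root's top, and its bottom with the
    auxiliary foot's bottom."""
    upper = (unify(site_top, aux_root_top), dict(aux_root_bottom))
    lower = (dict(aux_foot_top), unify(site_bottom, aux_foot_bottom))
    return upper, lower
```

A clash raised by `unify` is what blocks, for example, an indicative auxiliary tree from adjoining where the verb's features demand an infinitive.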
[Figure 2.6: Adjunction in FB-LTAG]

Footnote 4: Abbreviations in the figure: t = top feature structure, tr = top feature structure of the root, br = bottom feature structure of the root, U = unification.

Footnote 5: Abbreviations in the figure: t = top feature structure, b = bottom feature structure, tr = top feature structure of the root, br = bottom feature structure of the root, tf = top feature structure of the foot, bf = bottom feature structure of the foot, U = unification.

The notions of Obligatory and Selective Adjunction, crucial to the formation of lexicalized grammars, can also be handled through the use of features (footnote 6). Perhaps more important to developing a grammar, though, is that the trees can serve as schemata to be instantiated with lexical-specific features when an anchor is associated with the tree. To illustrate this, Figure 2.7 shows the same tree lexicalized with two different verbs, each of which instantiates the features of the tree according to its lexical selectional restrictions. In Figure 2.7, the lexical item thinks takes an indicative sentential complement, as in the sentence John thinks that Mary loves Sally. Want takes a sentential complement as well, but an infinitive one, as in John wants to love Mary. This distinction is easily captured in the features and passed to other nodes to constrain which trees this tree can adjoin into, both cutting down the number of separate trees needed and enforcing conceptual Selective Adjunctions (SA).

Footnote 6: The remaining constraint, Null Adjunction (NA), must still be specified directly on a node.
[Figure 2.7: Lexicalized elementary trees with features: the think tree and the want tree]

Chapter 3: Overview of the XTAG System

This section is derived in large part from the XTAG project notes ([Doran et al., 1994]). An additional section on Corpus Parsing and Evaluation has not been replicated here (but see Appendix D).
This section focuses on the various components that comprise the parser and English grammar in the XTAG system. Persons interested only in the linguistic analyses in the grammar may skip this section without loss of continuity, although we may occasionally refer back to the various components mentioned here.

3.1 System Description

Figure 3.1 shows the overall flow of the system when parsing a sentence. The input sentence is submitted to the Morphological Analyzer and the Tagger. The morphological analyzer retrieves the morphological information for each individual word in the sentence from the Morphological Database. The result is filtered in the P.O.S. Blender using the output of the Trigram Tagger to reduce the part-of-speech ambiguity of the words. The augmented sentence, with each word annotated with part-of-speech tags and morphological information, is input to the Parser, which then consults the Syntactic Database and the Tree Database to retrieve the appropriate tree structures for each word in the sentence. Information from the Statistical Database, along with a variety of heuristics, is used to reduce the number of trees selected. The parser then composes the structures to obtain the parse(s) of the sentence.

3.2 Morphological Analyzer

The morphology data was originally extracted from the Collins English Dictionary ([Hanks, 1979]) and Oxford Advanced Learner's Dictionary ([Hornby, 1974]) available through ACL-DCI ([Liberman, 1989]), and then cleaned up and augmented by hand ([Karp et al., 1992]). The database consists of approximately 317,000 inflected items, along with their root forms and inflectional information (such as case, number, tense). Thirteen different parts of speech are differentiated: Noun, Proper Noun, Pronoun, Verb, Verb Particle, Adverb, Adjective, Preposition, Complementizer, Determiner, Conjunction, Interjection, and Noun/Verb Contraction.
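Stepping back to the flow of Section 3.1, the morphology-then-tagger-then-blender stages can be mimicked with stub components. Everything here, the lexicon contents, the tag choices, and the function names, is invented for illustration and merely stands in for the real Morphological Database, Trigram Tagger, and P.O.S. Blender.

```python
# Toy stand-in for the Morphological Database; real entries are far richer.
MORPH_DB = {
    "I":   ("I",    ["Pron"]),
    "had": ("have", ["V"]),
    "a":   ("a",    ["Det"]),
    "map": ("map",  ["N", "V"]),   # ambiguous: noun or verb
}

def trigram_tag(words):
    """Stub tagger: returns one best tag per word (hard-coded here)."""
    best = {"I": "Pron", "had": "V", "a": "Det", "map": "N"}
    return [best[w] for w in words]

def pos_blend(words):
    """Filter each word's morphological POS candidates with the tagger's
    output, keeping the full candidate set if the tagger disagrees."""
    annotated = []
    for word, tag in zip(words, trigram_tag(words)):
        root, candidates = MORPH_DB[word]
        kept = [t for t in candidates if t == tag] or candidates
        annotated.append({"word": word, "root": root, "pos": kept})
    return annotated
```

For the report's example sentence, the blender cuts map down from {N, V} to just N before tree selection, which is the ambiguity reduction that yields the parsing speed-ups quoted below.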
Nouns and Verbs are the largest categories, with approximately 213,000 and 46,500 inflected forms respectively. This information is maintained in database form for quick access. Retrieval time for a given inflected entry is approximately 0.6 msec.

[Figure 3.1: Overview of the XTAG system]

3.3 Part of Speech Tagger

A trigram part-of-speech tagger ([Church, 1988]), trained on the Wall Street Journal corpus, is incorporated in XTAG. The trigram tagger has been extended to output the N-best part-of-speech sequences ([Soong and Huang, 1990]). XTAG uses this information to reduce the number of specious parses by filtering the possible parts of speech provided by the morphological analyzer for each word. When the correct part-of-speech sequence is returned, the time required to parse a sentence decreases by an average of 93%.

3.4 Parser

XTAG uses an Earley-style parser which has been extended to handle feature structures associated with trees ([Schabes, 1990]). The parser uses a general two-pass parsing strategy for lexicalized grammars ([Schabes et al., 1988]). In the tree-selection pass, the parser uses the syntactic database entry for each lexical item in the sentence to select a set of elementary structures from the tree database. The tree-grafting pass composes the selected trees using substitution and adjunction operations to obtain the parse of the sentence. The output of the parser for the sentence I had a map yesterday is illustrated in Figure 3.2. The parse tree (footnote 1) represents the surface constituent structure, while the derivation tree represents the derivation history of the parse. The nodes of the derivation tree are the tree names anchored by the lexical items (footnote 2).
The composition operation is indicated by the nature of the arcs: a dashed line is used for substitution and a bold line for adjunction. The number beside each tree name is the address of the node at which the operation took place. The derivation tree can also be interpreted as a dependency graph with unlabeled arcs between words of the sentence.

Footnote 1: The feature structures associated with each node of the parse tree are not shown here.

Footnote 2: Appendix B explains the conventions used in naming the trees.

[Figure 3.2: Output structures from the parser for I had a map yesterday: the parse tree and the derivation tree]

Additional methods that take advantage of FB-LTAGs have been implemented to improve the performance of the parser. For instance, the span (footnote 3) of the tree and the position of the anchor in the tree are used to weed out unsuitable trees in the first pass of the parser. Statistical information about the usage frequency of the trees has been acquired by parsing corpora. This information has been compiled into a statistical database that is used by the parser. These methods speed the runtime by approximately 87%.

3.5 Syntactic Database

The syntactic database associates lexical items with the appropriate trees and tree families based on various selectional information. The syntactic database entries were originally extracted from the Oxford Advanced Learner's Dictionary ([Hornby, 1974]) and Oxford Dictionary for Contemporary Idiomatic English ([Cowie and Mackin, 1975]) available through ACL-DCI ([Liberman, 1989]), and then modified and augmented by hand ([Egedi and Martin, 1994]). There are more than 37,000 syntactic database entries. Selected entries from this database are shown in Table 3.1.
Each syntactic entry consists of an index field, the uninflected form under which the entry is compiled in the database; an entry field, which contains all of the lexical items that will anchor the associated tree(s); a pos field, which gives the part of speech for the lexical item(s) in the entry field; and then either (but not both) a trees or fam field. The trees field indicates a list of individual trees to be associated with the entry, while the fam field indicates a list of tree families. A tree family may contain a number of trees. A syntactic entry may also contain a list of feature templates (fs) which expand out to feature equations to be placed in the specified tree(s). Any number of ex fields may be provided for example sentences. Note that lexical items may have more than one entry and may select the same tree more than once, using different features to capture lexical idiosyncrasies.

Footnote 3: The span of a tree is the number of terminals and non-terminals along its frontier.

Table 3.1: Example syntactic database entries

  INDEX: have/26
  ENTRY: have
  POS: V
  TREES: Vvx
  FS: #VPr ind, #VPr past, #VPr perfect+, #VP ppart, #VP pass
  EX: he had died; we had died

  INDEX: have/50
  ENTRY: have
  POS: V
  TREES: Vvx
  FS: #VP inf
  EX: John has to go to the store.

  INDEX: have/69
  ENTRY: NP0 have NP1
  POS: NP0 V NP1
  FAM: Tnx0Vnx1
  FS: #TRANS+
  EX: John has a problem.

  INDEX: map/1
  ENTRY: NP0 map out NP1
  POS: NP0 V PL NP1
  FAM: Tnx0Vplnx1

  INDEX: map/3
  ENTRY: map
  POS: N
  TREES: N, NXdxN, Nn
  FS: #N wh-, #N refl-

  INDEX: map/4
  ENTRY: map
  POS: N
  TREES: NXN
  FS: #N wh-, #N refl-, #N plur

The syntactic database is currently undergoing a series of changes designed to make it easier to use and update. In addition, the number of entries will be augmented to increase the coverage of the database, and the defaults used by the XTAG system will be accessible from the database itself.
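The entry layout lends itself to a direct rendering in code. The sketch below encodes the have/69 entry from Table 3.1 as a small dataclass; nx0Vnx1 is the family's declarative tree (it appears in the derivation tree of Figure 3.2), while the other member-tree names in the family table are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class SynEntry:
    index: str                                 # uninflected form + sense, e.g. "have/69"
    entry: str                                 # lexical anchor(s)
    pos: str                                   # part(s) of speech
    trees: list = field(default_factory=list)  # individual trees, OR ...
    fam: list = field(default_factory=list)    # ... tree families (not both)
    fs: list = field(default_factory=list)     # feature templates
    ex: list = field(default_factory=list)     # example sentences

have69 = SynEntry(index="have/69", entry="NP0 have NP1", pos="NP0 V NP1",
                  fam=["Tnx0Vnx1"], fs=["#TRANS+"],
                  ex=["John has a problem."])

# Hypothetical family table: a family name mapped to its member trees
# (only nx0Vnx1 is attested in this report; the rest are placeholders).
FAMILIES = {"Tnx0Vnx1": ["nx0Vnx1", "passive-tree", "imperative-tree"]}

def select_trees(e, families):
    """An entry names trees directly, or pulls in every tree of its families."""
    return list(e.trees) or [t for f in e.fam for t in families.get(f, [])]
```

The trees-or-fam exclusivity in the prose shows up here as the `or`: a direct tree list short-circuits family expansion.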
The format of the entries as seen in Table 3.1 will change slightly in the new version, but the same basic information will be included.

3.6 Tree Database

Trees in the English XTAG grammar fall into two conceptual classes. The smaller class consists of individual trees such as NP and adverb trees. The trees in this class are generally anchored by non-verbal lexical items. The larger class consists of trees that are grouped into tree families. These tree families represent subcategorization frames (see section 4.1). As of the end of 1994, there are 569 trees that compose 38 tree families, along with 67 individually selected trees in the tree database.

3.7 Statistics Database

The statistics database contains tree unigram frequencies which have been collected by parsing the Wall Street Journal, IBM manual, and ATIS corpora using the XTAG English grammar. The parser, augmented with the statistics database, assigns each word of the input sentence the top three most frequently used trees given the part of speech of the word. On failure, the parser retries using all the trees suggested by the syntactic database for each word. The augmented parser has been observed to have a success rate of 50% without retries.

3.8 X-Interface

In addition to the parser and English grammar, XTAG provides a graphical interface for manipulating TAGs. The interface offers the following:

  • Menu-based facility for creating and modifying tree files and loading grammar files.
  • User-controlled parser parameters, including the parsing of categories (S, embedded S, NP, DetP), and the use of the tagger (on/off/retry on failure).
  • Storage/retrieval facilities for elementary and parsed trees as text files.
  • The production of postscript files corresponding to elementary and parsed trees.
  • Graphical displays of tree and feature data structures, including a scroll `web' for large tree structures.
  • Mouse-based tree editor for creating and modifying trees and feature structures.
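The top-three-then-retry strategy of Section 3.7 can be sketched as follows; `freq`, the tree names other than nx0Vnx1, and `attempt_parse` are hypothetical stand-ins for the statistics database and the parser itself.

```python
def select_with_stats(word_trees, freq, attempt_parse):
    """First pass: keep only each word's three most frequent trees.
    On failure, retry with every tree the syntactic database suggests."""
    top3 = {w: sorted(ts, key=lambda t: -freq.get((w, t), 0))[:3]
            for w, ts in word_trees.items()}
    result = attempt_parse(top3)
    if result is not None:
        return result
    return attempt_parse(word_trees)   # retry with the full tree sets

# Toy data: four candidate trees for 'had', with invented frequencies.
word_trees = {"had": ["Vvx", "nx0Vnx1", "treeA", "treeB"]}
freq = {("had", "nx0Vnx1"): 50, ("had", "Vvx"): 30, ("had", "treeA"): 5}

def attempt_parse(selection):
    """Stub parser: succeeds only if the transitive tree survived the cut."""
    return selection if "nx0Vnx1" in selection["had"] else None
```

Because the first pass succeeds roughly half the time (the 50% figure above), the cheaper top-three selection pays for itself and the full retry remains a fallback.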
  • Hand combination of trees by adjunction or substitution for use in diagnosing grammar problems.

3.9 Computer Platform

XTAG was developed on the Sun SPARCstation series, and has been tested on the Sun 4 and HP BOBCAT series 9000. It is available through anonymous ftp, and requires 20MB of disk space. Please send mail to [email protected] for ftp instructions or more information. XTAG requires the following software to run:

  • A machine running UNIX and X11R4 (or higher). Previous releases of X will not work. X11R4 is free software available from MIT.
  • A Common Lisp compiler which supports the latest definition of Common Lisp (Steele's Common Lisp, second edition). XTAG has been tested with Lucid Common Lisp 4.0 and Allegro 4.0.1.
  • CLX version 4 or higher. CLX is the lisp equivalent to the Xlib package written in C.
  • Mark Kantrowitz's Lisp Utilities from CMU: logical-pathnames and defsystem.

The latest version of CLX (R5.0) and the CMU Lisp Utilities are provided in our ftp directory for your convenience. However, we ask that you refer to the appropriate source for updates. The morphology database component ([Karp et al., 1992]), no longer under licensing restrictions, is available as a separate system from the XTAG system. FTP instructions and more information can be obtained by mailing requests to [email protected]. The syntactic database component is also available as a separate system ([Egedi and Martin, 1994]). The new format of the database is expected to be available in 1995. FTP instructions and more information can be obtained by mailing requests to [email protected].

Chapter 4: Underview

The morphology, syntactic, and tree databases together comprise the English grammar. A lexical item that is not in the databases receives a default tree selection and features for its part of speech and morphology.
In designing the grammar, a decision was made early on to err on the side of acceptance whenever there were conflicting opinions as to whether or not a construction is grammatical. In this sense, the XTAG English grammar functions better as an acceptor than as a generator of English sentences. The range of syntactic phenomena that can be handled is large and includes auxiliaries (including inversion), copula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, it-clefts, wh-clefts, PRO constructions, noun-noun modifications, extraposition, determiner phrases, genitives, negation, noun-verb contractions, sentential adjuncts and imperatives. The combination of large-scale lexicons and wide phenomena coverage results in a robust system.

4.1 Subcategorization Frames

Elementary trees for non-auxiliary verbs are used to represent the linguistic notion of subcategorization frames. The anchor of the elementary tree subcategorizes for the other elements that appear in the tree, forming a clausal or sentential structure. Tree families group together trees belonging to the same subcategorization frame. Consider the following uses of the verb buy:

(1) Srini bought a book.
(2) Srini bought Beth a book.

In sentence (1), the verb buy subcategorizes for a direct object NP. The elementary tree anchored by buy is shown in Figure 4.1(a) and includes nodes for the NP complement of buy and for the NP subject. In addition to this declarative tree structure, the tree family also contains the trees that would be related to each other transformationally in a movement-based approach, i.e. passivization, imperatives, wh-questions, relative clauses, and so forth. Sentence (2) shows that buy also subcategorizes for a double NP object. This means that buy also selects the double NP object subcategorization frame, or tree family, with its own set of transformationally related sentence structures.
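Tree-family selection for buy can be sketched as a simple lookup from verb to the families of its subcategorization frames. The family name for the double-object frame and the member-tree labels below are invented for illustration; only Tnx0Vnx1, the transitive family, is attested in Table 3.1.

```python
# Hypothetical family table: each family groups the transformationally
# related trees of one subcategorization frame.
FAMILIES = {
    "Tnx0Vnx1":       ["declarative", "passive", "imperative", "wh-question"],
    "double-object":  ["declarative", "passive", "imperative", "wh-question"],
}

# buy selects both the transitive frame, as in (1), and the
# double NP object frame, as in (2).
SUBCAT = {"buy": ["Tnx0Vnx1", "double-object"]}

def frames_for(verb):
    """All tree families (subcategorization frames) a verb selects."""
    return {fam: FAMILIES[fam] for fam in SUBCAT.get(verb, [])}
```

Each frame brings in its whole family at once, so anchoring buy in the transitive family makes the passive and wh-question trees available without any listing of those trees in buy's own entry.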
Figure 4.1(b) shows the declarative structure for this set of sentence structures.

[Figure 4.1: Different subcategorization frames for the verb buy]

4.2 Complements and Adjuncts

Complements and adjuncts have very different structures in the XTAG grammar. Complements are included in the elementary tree anchored by the verb that selects them, while adjuncts do not originate in the same elementary tree as the verb anchoring the sentence, but are instead added to a structure by adjunction. The contrasts between complements and adjuncts have been extensively discussed in the linguistics literature, and the classification of a given element as one or the other remains a matter of debate (see [Rizzi, 1990], [Larson, 1988], [Jackendoff, 1990], [Larson, 1990], [Cinque, 1990], [Obernauer, 1984], [Lasnik and Saito, 1984], and [Chomsky, 1986]). The guiding rule used in developing the XTAG grammar is whether or not the sentence is ungrammatical without the questioned structure (footnote 1). Consider the following sentences:

(3) Srini bought a book.
(4) Srini bought a book at the bookstore.
(5) Srini arranged for a ride.
(6) Srini arranged.

Prepositional phrases frequently occur as adjuncts, and when they are used as adjuncts they have the tree structure shown in Figure 4.2(a). This adjunction tree would adjoin into the tree shown in Figure 4.1(a) to generate sentence (4). There are verbs, however, such as arrange, hunger and differentiate, that take prepositional phrases as complements. Sentences (5) and (6) clearly show that the prepositional phrase is not optional for these verbs. For these sentences, the prepositional phrase will be an initial tree (as shown in Figure 4.2(b)) that substitutes into an elementary tree, such as the one anchored by the verb arrange in Figure 4.2(c). Virtually all parts of speech, except for main verbs, function as both complements and adjuncts in the grammar.
More information is available in this report on various parts of speech as complements: adjectives (e.g. section 6.13), nouns (e.g. section 6.2), and prepositions (e.g. section 6.11); and as adjuncts: adjectives (section 19.1), adverbs (section 19.4), nouns (section 19.2), and prepositions (section 19.3).

[1] Iteration of a structure can also be used as a diagnostic: Srini bought a book at the bookstore on Walnut Street for a friend.

[Figure 4.2: Trees illustrating the difference between complements and adjuncts. (a) PP adjunction tree: VPr dominating VP* and a PP containing (P at) and NP↓. (b) PP initial tree: PP containing (P for) and NP↓. (c) Sr with NP0↓, and VP containing (V arranged) and PP↓.]

4.3 Non-S constituents

Although sentential trees are generally considered to be special cases in any grammar, insofar as they make up a `starting category', it is the case that any initial tree constitutes a phrasal constituent. These initial trees may have substitution nodes that need to be filled (by other initial trees), and may be modified by adjunct trees, exactly as the trees rooted in S. Although grouping is possible according to the heads or anchors of these trees, we have not found any classification similar to the subcategorization frames for verbs that can be used by a lexical entry to `group select' a set of trees. These trees are selected one by one by each lexical item, according to each lexical item's idiosyncrasies. The grammar described by this technical report places them into several files for ease of use, but these files do not constitute tree families in the way that the subcategorization frames do.

4.4 Case Assignment

4.4.1 Approaches to Case

4.4.1.1 Case in GB theory

GB (Government and Binding) theory proposes the following `case filter' as a requirement on S-structure.[2]

Case Filter: Every overt NP must be assigned abstract case. [Haegeman, 1991]

Abstract case is taken to be universal.
Languages with rich morphological case marking, such as Latin, and languages with very limited morphological case marking, like English, are all presumed to have full systems of abstract case that differ only in the extent of morphological realization. In GB, abstract case is assigned to NP's by various case assigners, namely verbs, prepositions, and INFL. Verbs and prepositions are said to assign accusative case to NP's that they govern, and INFL assigns nominative case to NP's that it governs. These governing categories are constrained in where they can assign case by means of `barriers' based on `minimality conditions', although these are relaxed in `exceptional case marking' situations. The details of the GB analysis are beyond the scope of this technical report, but see [Chomsky, 1986] for the original analysis or [Haegeman, 1991] for an overview. Let it suffice for us to say that the notion of abstract case and the case filter are useful in accounting for a number of phenomena including the distribution of nominative and accusative case, and the distribution of overt NP's and empty categories (such as PRO).

[2] There are certain problems with applying the case filter as a requirement at the level of S-structure. These issues are not crucial to the discussion of the English XTAG implementation of case and so will not be discussed here. Interested readers are referred to [Lasnik and Uriagereka, 1988].

4.4.1.2 Minimalism and Case

A major conceptual difference between GB theories and Minimalism is that in Minimalism, lexical items carry their features with them rather than being assigned their features based on the nodes that they end up at. For nouns, this means that they carry case with them, and that their case is `checked' when they are in SPEC position of AGRs or AGRo, which subsequently disappears ([Chomsky, 1992]).
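As a rough illustration, the case filter amounts to a well-formedness check over the NPs of an analysis. The sketch below is a toy rendering under assumed NP records; it is not part of any actual grammar implementation.

```python
def satisfies_case_filter(nps):
    """Every overt NP must have been assigned abstract case; empty
    categories such as PRO are exempt from the filter."""
    return all(np.get("case") is not None
               for np in nps if np.get("overt", True))

# Hypothetical NP records for "Srini bought a book" (plus a PRO for contrast):
ok = [{"form": "Srini", "overt": True, "case": "nom"},
      {"form": "a book", "overt": True, "case": "acc"},
      {"form": "PRO", "overt": False, "case": None}]

# An overt NP left caseless violates the filter:
bad = [{"form": "Srini", "overt": True, "case": None}]

assert satisfies_case_filter(ok)
assert not satisfies_case_filter(bad)
```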
4.4.2 Case in XTAG

The English XTAG grammar adopts the notion of case and the case filter for many of the same reasons argued in the GB literature. However, in some respects the English XTAG grammar's implementation of case more closely resembles the treatment in Chomsky's Minimalism framework ([Chomsky, 1992]) than the system outlined in the GB literature ([Chomsky, 1986]). As in Minimalism, nouns in the XTAG grammar carry case with them, which is eventually `checked'. However, in the XTAG grammar, noun cases are checked against the case values assigned by the verb during the unification of the feature structures. Unlike Chomsky's Minimalism, there are no separate AGR nodes; the case checking comes from the verbs directly. Case assignment from the verb is more like the GB approach than the requirement of a SPEC-head relationship in Minimalism.

Most nouns in English do not have separate forms for nominative and accusative case, and so they are ambiguous between the two. Pronouns, of course, are morphologically marked for case, and each carries the appropriate case in its feature. Figures 4.3(a) and 4.3(b) show the NP tree anchored by a noun and a pronoun, respectively, along with the feature values associated with each word. Note that books simply gets the default case nom/acc, while she restricts the case to be nom.

[Figure 4.3: Lexicalized NP trees with case markings. (a) books: NP with case : nom/acc and agr (3rdsing : -, num : plur, pers : 3). (b) she: NP with case : nom, pron : +, and agr (gen : fem, 3rdsing : +, num : sing, pers : 3).]

4.4.3 Case Assigners

4.4.3.1 Prepositions

Case is assigned in the XTAG English grammar by two components: verbs and prepositions.[3] Prepositions assign accusative case (acc) through their assign-case feature, which is linked directly to the case feature of their objects. Figure 4.4(a) shows a lexicalized preposition tree, while Figure 4.4(b) shows the same tree with the NP tree from Figure 4.3(a) substituted into the NP position. Figure 4.4(c) is the tree in Figure 4.4(b) after unification has taken place. Note that the case ambiguity of books has been resolved to accusative case.

[3] For also assigns case as a complementizer. See section 8.5 for more details.

4.4.3.2 Verbs

Verbs are the other part of speech in the XTAG grammar that can assign case. Because XTAG does not distinguish INFL and VP nodes, verbs must provide case assignment on the subject position in addition to the case assigned to their NP complements. Assigning case to NP complements is handled by building the case values of the complements directly into the tree that the case assigner (the verb) anchors. Figures 4.5(a) and 4.5(b) show the S trees[4] that would be anchored[5] by a transitive and a ditransitive verb, respectively. Note that the case assignments for the NP complements are already in the tree, even though there is not yet a lexical item anchoring the tree. Since every verb that selects these trees (and other trees in each respective subcategorization frame) assigns the same case to the complements, building case features into the tree has exactly the same result as putting the case feature value in each verb's lexical entry.

The case assigned to the subject position varies with verb form. Since the XTAG grammar treats the inflected verb as a single unit rather than dividing it into INFL and V nodes, case, along with tense and agreement, is expressed in the features of verbs, and must be passed in the appropriate manner. The trees in Figure 4.6 show the path of linkages that joins the assign-case feature of the V to the case feature of the subject NP.
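The unification step described above, in which a case assigner's assign-case value resolves the nom/acc ambiguity of an NP, can be sketched with sets of case alternatives. This is a minimal sketch under an assumed value representation; the real XTAG machinery unifies full feature structures.

```python
def unify_case(a, b):
    """Unify two case values, each a set of alternatives or None (unspecified)."""
    if a is None:
        return b
    if b is None:
        return a
    common = a & b  # keep only the alternatives both sides allow
    if not common:
        raise ValueError("case clash: %r vs %r" % (a, b))
    return common

# Most nouns are ambiguous between nominative and accusative (nom/acc),
# while pronouns carry a single morphologically marked value.
books = {"nom", "acc"}
she = {"nom"}
her = {"acc"}

# A preposition's assign-case : acc is linked to its object's case feature,
# so unification resolves the ambiguity of "books" (as in Figure 4.4):
assert unify_case(books, {"acc"}) == {"acc"}   # "of books"
assert unify_case(her, {"acc"}) == {"acc"}     # "of her"

# An ungrammatical combination simply fails to unify:
clash = False
try:
    unify_case(she, {"acc"})                   # *"of she"
except ValueError:
    clash = True
assert clash
```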
[4] Features not pertaining to this discussion have been omitted to improve readability and to make the trees easier to fit onto the page.

[5] The diamond marker (◊) indicates the anchor(s) of a structure if the tree has not yet been lexicalized.

[Figure 4.4: Assigning case in prepositional phrases. (a) The lexicalized tree for of, with P's assign-case : acc linked to the case feature of the NP↓ substitution node. (b) The same tree with the NP tree for books substituted in, case still nom/acc. (c) The tree after unification: the case of books has been resolved to acc.]

[Figure 4.5: Case assignment to NP arguments. (a) Transitive S tree: NP1↓ carries case : acc. (b) Ditransitive S tree: NP1↓ and NP2↓ both carry case : acc.]

The morphological form of the verb determines the value of the assign-case feature. Figures 4.6(a) and 4.6(b) show the same tree[6] anchored by different morphological forms of the verb sing, which give different values for the assign-case feature. The adjunction of an auxiliary verb onto the VP node breaks the link from the main V, replacing it with a link from the auxiliary verb instead.[7] The progressive form of the verb in Figure 4.6(b) has the feature-value assign-case = none, but this is overridden by the adjunction of the appropriate form of the auxiliary word be. Figure 4.7(a) shows the lexicalized auxiliary tree, while Figure 4.7(b) shows it adjoined into the transitive tree shown in Figure 4.6(b). The case value passed to the subject NP is now nom (nominative).

[6] Again, the feature structures shown have been restricted to those that pertain to the V/NP interaction.

[7] See section 20.1 for a more complete explanation of how this relinking occurs.

[Figure 4.6: Assigning case according to verb form. (a) The transitive tree anchored by sings, with assign-case : nom and mode : ind. (b) The same tree anchored by singing, with assign-case : none and mode : ger.]

[Figure 4.7: Proper case assignment with auxiliary verbs. (a) The lexicalized auxiliary tree for is, with assign-case : nom. (b) The auxiliary tree adjoined into the transitive tree of Figure 4.6(b): the assign-case value reaching the subject NP is now nom.]

4.4.4 PRO in a unification-based framework

Most forms of a verb assign nominative case, although some forms, such as the past participle, assign no case whatsoever. This is different from assigning case none, as the progressive form of the verb sing does in Figure 4.6(b). The distinction of a case none from no case is indicative of a divergence from the standard GB theory. In GB theory, the absence of case on an NP means that only PRO can fill that NP. With feature unification as used in the FB-LTAG grammar, the absence of case on an NP means that any NP can fill it, regardless of its case.
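The interaction just described between verb form, auxiliary adjunction, and the special value none can be sketched as follows. The table of verb forms and the compatibility test are illustrative assumptions following the discussion above, not XTAG code.

```python
# assign-case value contributed by each verb form; "ppart" assigns
# no case at all, which we render as None (unspecified).
ASSIGN_CASE = {
    "ind": "nom",    # tensed "sings" assigns nominative
    "ger": "none",   # progressive "singing" assigns the explicit value none
    "ppart": None,   # past participle leaves case unspecified
}

def subject_case(main_form, aux_form=None):
    """The assign-case value reaching the subject NP. Adjoining an auxiliary
    at VP breaks the link from the main V, so the auxiliary's value wins."""
    return ASSIGN_CASE[aux_form if aux_form else main_form]

def compatible(assigned, np_case):
    """Unification of atomic values: an unspecified side matches anything."""
    return assigned is None or np_case is None or assigned == np_case

PRO, she = "none", "nom"   # PRO is the only NP with case none

assert subject_case("ind") == "nom"                   # "She sings songs."
assert subject_case("ger", aux_form="ind") == "nom"   # "She is singing songs."

# The explicit value none licenses only PRO in subject position:
assert compatible(subject_case("ger"), PRO)           # "PRO singing songs"
assert not compatible(subject_case("ger"), she)       # *"she singing songs"

# If "singing" merely left case unspecified, any NP would wrongly unify:
assert compatible(None, she)
```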
The reason is the mechanism of unification: anything unspecified can unify with anything. Thus we have a specific case none to handle verb forms that in GB theory do not assign case. PRO is the only NP with case none. Verb forms that assign no case, such as the past participle mentioned above, can do so because they cannot occur without an auxiliary verb which takes care of the case assignment. Note that although we are drawn to this treatment by our use of unification for feature manipulation, our treatment is very similar to the assignment of null case to PRO in [Chomsky and Lasnik, 1993]. [Watanabe, 1993] also proposes a very similar approach within Chomsky's Minimalist framework.[8]

[8] See section 8.1 for additional discussion of PRO.

Part II: Verb Classes

Chapter 5: Where to Find What

The two-page table that follows gives an overview of what types of trees occur in various tree families, with pointers to discussion in this report. An entry in a cell of the table indicates that the tree(s) for the construction named in the row header are included in the tree family named in the column header. Entries are of two types. If the particular tree(s) are displayed and/or discussed in this report, the entry gives a page number reference to the relevant discussion or figure.[1] Otherwise, a check mark indicates inclusion in the tree family but no figure or discussion related specifically to that tree in this report. Blank cells indicate that there are no trees for the construction named in the row header in the tree family named in the column header. The table below gives the expansion of abbreviations in the table headers.

[1] Since Chapter 6 has a brief discussion and a declarative tree for every tree family, page references are given only for other sections in which discussion or tree diagrams appear.

Abbreviation = Full Name
Sentential Comp. with NP = Sentential Complement with NP
Ditrans. Light Verbs w. PP Shift = Ditransitive Light Verbs with PP Shift
Ditrans. Light Verbs w/o PP Shift = Ditransitive Light Verbs without PP Shift
Adj. Sm. Cl. w. Sentential Subj. = Adjective Small Clause with Sentential Subject
NP Sm. Clause w. Sentential Subj. = NP Small Clause with Sentential Subject
PP Sm. Clause w. Sentential Subj. = PP Small Clause with Sentential Subject
Y/N question = Yes/No question
Wh-mov. NP complement = Wh-moved NP complement
Wh-mov. S comp. = Wh-moved S complement
Wh-mov. Adj comp. = Wh-moved Adjective complement
Wh-mov. object of a P = Wh-moved object of a P
Wh-mov. PP = Wh-moved PP
Topic. NP complement = Topicalized NP complement
Det. gerund = Determiner gerund
Rel. cl. on NP comp. = Relative clause on NP complement
Rel. cl. on PP comp. = Relative clause on PP complement
Rel. cl. on NP object of P = Relative clause on NP object of P
Pass. with wh-moved subj. = Passive with wh-moved subject (with and without by phrase)
Pass. w. wh-mov. ind. obj. = Passive with wh-moved indirect object (with and without by phrase)
Pass. w. wh-mov. obj. of the by phrase = Passive with wh-moved object of the by phrase
Pass. w. wh-mov. by phrase = Passive with wh-moved by phrase
Cl. S mod. (decl.) = Clausal S modifier (declarative)
Cl. VP mod. (decl.) = Clausal VP modifier (declarative)
Cl. S mod. (pass., w. by phrase) = Clausal S modifier (passive, with by phrase)
Cl. VP mod. (pass., w. by phrase) = Clausal VP modifier (passive, with by phrase)
Cl. S mod. (pass., w.out by phrase) = Clausal S modifier (passive, without by phrase)
Cl. VP mod. = Clausal VP modifier
Adj. Sm. Cl. w. Sent. Comp. = Adjective Small Clause with Sentential Complement
NP Sm. Cl. w. Sent. Comp. = NP Small Clause with Sentential Complement
PP Sm. Cl. w. Sent. Comp. = PP Small Clause with Sentential Complement
